Foreword

This Technical Specification (TS) has been produced by the ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities or GSM identities. 
These should be interpreted as being references to the corresponding ETSI deliverables. The mapping of document 
identities is as follows: 

For 3GPP documents: 

3G TS | TR nn.nnn "<title>" (with or without the prefix 3G)

is equivalent to

ETSI TS | TR 1nn nnn "[Digital cellular telecommunications system (Phase 2+) (GSM);] Universal Mobile
Telecommunications System; <title>"

For GSM document identities of type "GSM xx.yy", e.g. GSM 01.04, the corresponding ETSI document identity may be
found in the Cross Reference List on www.etsi.org/key

Foreword



This Technical Specification has been produced by the 3GPP. 

The present document is an introduction to the speech processing parts of the narrowband telephony speech service 
employing the Adaptive Multi-Rate (AMR) speech coder within the 3GPP system. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version 3.y.z 

where: 

x the first digit:

1 presented to TSG for information;

2 presented to TSG for approval;

3 indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections,
updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope

The present document is an introduction to the speech processing parts of the narrowband telephony
speech service employing the Adaptive Multi-Rate (AMR) speech coder. A general overview of the speech
processing functions is given, with reference to the documents where each function is specified in detail.



2 Normative references 

This TS incorporates by dated and undated reference, provisions from other publications. These normative 
references are cited at the appropriate places in the text and the publications are listed hereafter. For dated 
references, subsequent amendments to or revisions of any of these publications apply to this TS only when 
incorporated in it by amendment or revision. For undated references, the latest edition of the publication 
referred to applies. 



[1] GSM 03.50: "Digital cellular telecommunications system (Phase 2); Transmission
planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN)
system".

[2] 3G TS 26.090: "AMR Speech Codec; Transcoding functions".

[3] 3G TS 26.073: "AMR Speech Codec; ANSI-C code".

[4] 3G TS 26.074: "AMR Speech Codec; Test sequences".

[5] 3G TS 26.093: "AMR Speech Codec; Source Controlled Rate operation".

[6] 3G TS 26.094: "AMR Speech Codec; Voice Activity Detection (VAD)".

[7] 3G TS 26.092: "AMR Speech Codec; Comfort Noise Aspects".

[8] 3G TS 26.091: "AMR Speech Codec; Error Concealment of Lost Frames".

[9] 3G TS 26.101: "AMR Speech Codec; Frame Structure".

[10] 3G TS 26.102: "AMR Speech Codec; Interface to RAN".

[11] TS 26.901: "AMR Speech Codec; Performance characterisation".



3 Definitions and abbreviations

3.1 Abbreviations



For the purposes of this TS, the following abbreviations apply: 

ACELP Algebraic Code Excited Linear Prediction 

AMR Adaptive Multi-Rate 

BFI Bad Frame Indication 

CHD Channel Decoder 

CHE Channel Encoder 

GSM Global System for Mobile communications 

ITU-T International Telecommunication Union - Telecommunication standardisation sector (former CCITT)

PCM Pulse Code Modulation 

PLMN Public Land Mobile Network 

PSTN Public Switched Telephone Network 

RX Receive 

SCR Source Controlled Rate

SPD SPeech Decoder

SPE SPeech Encoder

TC Transcoder

TX Transmit

UE User Equipment (terminal)



4 General

The AMR speech coder consists of the multi-rate speech coder, a source controlled rate scheme including a
voice activity detector and a comfort noise generation system, and an error concealment mechanism to
combat the effects of transmission errors and lost packets.

The multi-rate speech coder is a single integrated speech codec with eight source rates from 4.75 kbit/s to 
12.2 kbit/s, and a low rate background noise encoding mode. The speech coder is capable of switching its 
bit-rate every 20 ms speech frame upon command. 

A reference configuration where the various speech processing functions are identified is given in Figure 1.
In this figure, the relevant specifications for each function are also indicated.

In Figure 1, the audio parts including analogue to digital and digital to analogue conversion are included, to
show the complete speech path between the audio input/output in the User Equipment (UE) and the digital
interface of the network. The detailed specification of the audio parts is not within the scope of this
document. These aspects are only considered to the extent that the performance of the audio parts affects the
performance of the speech transcoder.

Figure 1: Overview of audio processing functions.

1) 8-bit A-law or μ-law PCM (ITU-T Recommendation G.711), 8 000 samples/s

2) 13-bit uniform PCM, 8 000 samples/s

3) Voice Activity Detector (VAD) flag

4) Encoded speech frame, 50 frames/s, number of bits/frame depending on the AMR codec mode

5) Silence Descriptor (SID) frame

6) TX_TYPE, 2 bits, indicates whether information bits are available and if they are speech or SID
information

7) Information bits delivered to the 3G AN

8) Information bits received from the 3G AN

9) RX_TYPE, the type of frame received quantized into three bits

5 Adaptive Multi-Rate speech codec transcoding functions

The adaptive multi-rate speech codec is described in [2]. The technical content is identical to that of GSM
06.90.

As shown in Figure 1, the speech encoder takes its input as a 13-bit uniform Pulse Code Modulated (PCM)
signal either from the audio part of the UE or, on the network side, from the Public Switched Telephone
Network (PSTN) via an 8-bit A-law or μ-law to 13-bit uniform PCM conversion. The encoded speech at the
output of the speech encoder is packetized and delivered to the network interface. In the receive direction,
the inverse operations take place.
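
As an illustration of the network-side conversion mentioned above, the following minimal sketch expands one 8-bit A-law code word into a 13-bit uniform PCM sample. It follows the widely used table-free G.711 expansion procedure; the function name alaw_to_pcm13 is hypothetical and is not taken from any 3GPP or ITU-T text. μ-law input would need a different expansion, which is not shown.

```python
def alaw_to_pcm13(code: int) -> int:
    """Expand one 8-bit A-law code word to a signed 13-bit uniform PCM value.

    Hypothetical helper for illustration only; not part of any specification.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("A-law code word must fit in 8 bits")
    code ^= 0x55                       # undo the even-bit inversion used by A-law
    mantissa = (code & 0x0F) << 4      # 4-bit mantissa, placed on a 16-bit scale
    segment = (code & 0x70) >> 4       # 3-bit segment number
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    sample16 = magnitude if code & 0x80 else -magnitude
    # Every 16-bit result is a multiple of 8, so dividing by 8 gives the
    # 13-bit uniform sample without losing information.
    return sample16 // 8


if __name__ == "__main__":
    for word in (0xD5, 0x55, 0xAA, 0x2A):
        print(hex(word), alaw_to_pcm13(word))
```

The largest magnitude produced is 4032, which fits in a signed 13-bit word.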

The detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded 
blocks (in which the number of bits depends on the presently used codec mode) and from these to output 
blocks of 160 reconstructed speech samples is described in [2]. The coding scheme is Multi-Rate Algebraic 
Code Excited Linear Prediction. The bit-rates of the source codec are listed in Table 1.

An AMR speech codec capable UE shall support all source rates listed in Table 1.

Table 1: Source codec bit-rates for the AMR codec.

Codec mode    Source codec bit-rate
AMR_12.20     12.20 kbit/s (GSM EFR)
AMR_10.20     10.20 kbit/s
AMR_7.95      7.95 kbit/s
AMR_7.40      7.40 kbit/s (IS-641)
AMR_6.70      6.70 kbit/s (PDC-EFR)
AMR_5.90      5.90 kbit/s
AMR_5.15      5.15 kbit/s
AMR_4.75      4.75 kbit/s
AMR_SID       1.80 kbit/s (*)



(*) Assuming SID frames are continuously transmitted

NOTE 1: GSM-EFR is the ETSI GSM 06.60 Enhanced Full Rate Speech Codec (also
identical to the TIA TDMA-US1 Enhanced speech codec)
NOTE 2: IS-641 is the TIA/EIA IS-641 TDMA Enhanced Full Rate Speech Codec
NOTE 3: PDC-EFR is the ARIB 6.7 kbit/s Enhanced Full Rate Speech Codec
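
Because the codec operates on 20 ms frames, each source rate in Table 1 corresponds to a fixed number of encoded speech bits per frame. The sketch below derives those counts by simple arithmetic (rate multiplied by frame duration). The names used are hypothetical, and AMR_SID is deliberately left out because its Table 1 figure assumes continuous SID transmission.

```python
FRAME_DURATION_MS = 20  # one speech frame (50 frames/s)

# Source codec bit-rates of Table 1, expressed in bit/s.
AMR_SOURCE_RATES_BPS = {
    "AMR_12.20": 12200,
    "AMR_10.20": 10200,
    "AMR_7.95": 7950,
    "AMR_7.40": 7400,
    "AMR_6.70": 6700,
    "AMR_5.90": 5900,
    "AMR_5.15": 5150,
    "AMR_4.75": 4750,
}


def bits_per_frame(rate_bps: int) -> int:
    """Return the number of bits carried by one 20 ms frame at rate_bps."""
    bits, remainder = divmod(rate_bps * FRAME_DURATION_MS, 1000)
    if remainder:
        raise ValueError("rate does not give a whole number of bits per frame")
    return bits


FRAME_BITS = {mode: bits_per_frame(rate) for mode, rate in AMR_SOURCE_RATES_BPS.items()}

if __name__ == "__main__":
    for mode, bits in FRAME_BITS.items():
        print(f"{mode}: {bits} bits per frame")
```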



6 Adaptive Multi-Rate speech codec ANSI C-code

The ANSI C-code of the speech codec, VAD and CNG system is described in [3]. The ANSI C-code is
mandatory. The ANSI C-code is identical to that of GSM 06.73.

7 Adaptive Multi-Rate speech codec test vectors

A set of digital test sequences is specified in [4], thus enabling the verification of compliance, i.e.
bit-exactness, to a high degree of confidence. The test vectors are identical to those of GSM 06.74.

The test sequences are defined separately for: 

- The speech codec described in [2],

- The VAD described in [6],

- The CN generation described in [7].

The adaptive multi-rate speech transcoder, VAD, SCR system and comfort noise parts of the audio
processing functions (see Figure 1) are defined in bit exact arithmetic. Consequently, they shall react on a
given input sequence always with the corresponding bit exact output sequence, provided that the internal
state variables are also always exactly in the same state at the beginning of the test.

The input test sequences provided shall force the corresponding output test sequences, provided that the 
tested modules are in their home-state when starting. 

The modules may be set into their home states by provoking the appropriate homing-functions. 

NOTE: This is normally done during reset (initialisation of the codec). 

Special inband signalling frames (encoder-homing-frame and decoder-homing-frame) described in [2] have
been defined to provoke these homing-functions also in remotely placed modules.

At the end of the first received homing frame, the audio functions that are defined in a bit exact way shall go 
into their predefined home states. The output corresponding to the first homing frame is dependent on the 
codec state when the frame was received. Any consecutive homing frames shall produce corresponding 
homing frames at the output. 
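
The dependence of bit-exact testing on the home state can be illustrated with a toy example. The class below is only a stand-in with internal state (a 16-bit accumulator) and is not the AMR codec; all names are hypothetical. It shows that a test sequence reproduces its expected output only if the module is homed before the test starts.

```python
from typing import List, Sequence


class ToyBitExactModule:
    """Toy stateful module defined in exact integer arithmetic (NOT the AMR codec)."""

    def __init__(self) -> None:
        self.home()

    def home(self) -> None:
        """Homing function: put the internal state into its predefined home state."""
        self._state = 0

    def process(self, samples: Sequence[int]) -> List[int]:
        """Process samples; the output depends on the input and on the internal state."""
        output = []
        for sample in samples:
            self._state = (self._state + sample) & 0xFFFF  # 16-bit wrap-around
            output.append(self._state)
        return output


def passes_test_sequence(module: ToyBitExactModule,
                         input_seq: Sequence[int],
                         expected_output: Sequence[int]) -> bool:
    """Home the module, then require an exact match with the expected output."""
    module.home()
    return module.process(input_seq) == list(expected_output)


if __name__ == "__main__":
    module = ToyBitExactModule()
    print(passes_test_sequence(module, [1, 2, 3], [1, 3, 6]))
    # Without homing, the state left over from the previous run changes the output.
    print(module.process([1, 2, 3]) == [1, 3, 6])
```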

8 Adaptive Multi-Rate speech codec source controlled rate operation

The source controlled rate operation of the adaptive multi-rate speech codec is defined in [5]. 

During a normal telephone conversation, the participants alternate so that, on the average, each direction of 
transmission is occupied about 50 % of the time. Source controlled rate (SCR) is a mode of operation where 
the speech encoder encodes speech frames containing only background noise with a lower bit-rate than 
normally used for encoding speech. A network may adapt its transmission scheme to take advantage of the 
varying bit-rate. This may be done for the following two purposes: 

1) In the UE, battery life will be prolonged or a smaller battery could be used for a given operational
duration.

2) The average required bit-rate is reduced, leading to a more efficient transmission with decreased load 
and hence increased capacity. 
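
The reduction in average bit-rate can be estimated with a simple weighted mean. The sketch below assumes that speech frames are sent at one fixed codec mode and that non-speech periods cost the AMR_SID rate of Table 1, which itself assumes continuously transmitted SID frames and is therefore an upper bound. The function name and parameters are hypothetical.

```python
def average_bit_rate_bps(speech_rate_bps: int,
                         background_rate_bps: int,
                         speech_activity_percent: int) -> float:
    """Weighted mean of the speech and background-noise rates, in bit/s."""
    if not 0 <= speech_activity_percent <= 100:
        raise ValueError("speech activity must be between 0 and 100 percent")
    weighted_sum = (speech_activity_percent * speech_rate_bps
                    + (100 - speech_activity_percent) * background_rate_bps)
    return weighted_sum / 100


if __name__ == "__main__":
    # 12.20 kbit/s speech mode, 1.80 kbit/s background mode, 50 % speech activity.
    print(average_bit_rate_bps(12200, 1800, 50))
```

With these example figures the average source rate is 7.0 kbit/s, compared with 12.2 kbit/s without source controlled rate operation.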



The following functions are required for the source controlled rate operation: 

- a Voice Activity Detector (VAD) on the TX side; 

- evaluation of the background acoustic noise on the TX side, in order to transmit characteristic 
parameters to the RX side; 

- generation of comfort noise on the RX side during periods when no normal speech frames are 
received. 



The transmission of comfort noise information to the RX side is achieved by means of a Silence Descriptor 
(SID) frame, which is sent at regular intervals. 
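
The interplay of the VAD decision and the regular SID transmission can be sketched as follows. This is a deliberately simplified illustration and not the procedure of [5]: it ignores any hangover or special first-SID handling, the labels "SPEECH", "SID" and "NO_DATA" are hypothetical names, and the default sid_interval is a placeholder value chosen for the example.

```python
from typing import Iterable, List


def scr_tx_frame_types(vad_flags: Iterable[bool], sid_interval: int = 8) -> List[str]:
    """Label each 20 ms frame on the TX side from its VAD flag.

    Simplified sketch: a SID frame opens every speech pause and is repeated
    every sid_interval frames; nothing is sent for the frames in between.
    """
    if sid_interval < 1:
        raise ValueError("sid_interval must be at least 1")
    labels = []
    pause_length = 0  # number of non-speech frames seen in the current pause
    for speech_present in vad_flags:
        if speech_present:
            labels.append("SPEECH")
            pause_length = 0
        else:
            labels.append("SID" if pause_length % sid_interval == 0 else "NO_DATA")
            pause_length += 1
    return labels


if __name__ == "__main__":
    flags = [True, False, False, False, False, True, False]
    print(scr_tx_frame_types(flags, sid_interval=3))
```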

9 Adaptive Multi-Rate speech codec voice activity detection

The adaptive multi-rate VAD function is described in [6].

The input to the VAD is the input speech itself together with a set of parameters computed by the adaptive
multi-rate speech encoder. The VAD uses this information to decide whether each 20 ms speech coder
frame contains speech or not.

The VAD algorithm is described in [6], and the corresponding C code is defined in [3]. The verification of
compliance to [6] is achieved by use of digital test sequences applied to the same interface as the test
sequences for the speech codec.

10 Adaptive Multi-Rate speech codec comfort noise insertion

The adaptive multi-rate comfort noise insertion function is described in [7]. 

When speech is absent, the synthesis in the speech decoder is different from the case when normal speech 
frames are received. The synthesis of an artificial noise based on the received non-speech parameters is 
termed comfort noise generation. 

The comfort noise generation process is as follows: 

- the evaluation of the acoustic background noise in the transmitter; 

- the noise parameter encoding (SID frames) and decoding, and 

- the generation of comfort noise in the receiver. 



The comfort noise processes and the algorithm for updating the noise parameters during speech pauses are 
defined in detail in [7], and the corresponding C code is defined in [3]. The comfort noise mechanism is 
based on the adaptive multi-rate speech codec defined in [2]. 



11 Adaptive Multi-Rate speech codec error concealment of lost frames

The adaptive multi-rate speech codec error concealment of lost frames is described in [8]. 

Frames may be lost due to transmission errors or frame stealing in a wireless environment. Actions which
shall be taken in these cases, both for lost speech frames and for lost SID frames, are described in [8]. Error
concealment actions shall be used also in the case of lost speech packets in the transport network. The 
methods described in [8] may with some modifications be used as a basis for such actions. 

In order to mask the effect of isolated lost frames, the speech decoder shall be informed and the error 
concealment actions shall be initiated, whereby a set of predicted parameters are used in the speech 
synthesis. Insertion of speech signal independent silence frames is not allowed. For several subsequent lost 
frames, a muting technique shall be used to indicate to the listener that transmission has been interrupted. 
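
The general idea of substituting predicted parameters for isolated losses and muting after several consecutive losses can be sketched on a single gain value per frame. This is not the procedure of [8]: the attenuation factor, the muting threshold and all names are hypothetical, and a real decoder conceals a full set of codec parameters rather than one gain.

```python
from typing import Iterable, List, Optional


def conceal_gains(frame_gains: Iterable[Optional[float]],
                  attenuation: float = 0.5,
                  mute_after: int = 3) -> List[float]:
    """Return the gain used for synthesis in each frame.

    A value of None marks a lost frame (bad frame indication set). Lost frames
    reuse the last good gain, attenuated once more per consecutive loss; from
    the mute_after-th consecutive loss onwards the output is muted.
    """
    used_gains = []
    last_good_gain = 0.0
    consecutive_losses = 0
    for gain in frame_gains:
        if gain is not None:
            last_good_gain = gain
            consecutive_losses = 0
            used_gains.append(gain)
        else:
            consecutive_losses += 1
            if consecutive_losses >= mute_after:
                used_gains.append(0.0)
            else:
                used_gains.append(last_good_gain * attenuation ** consecutive_losses)
    return used_gains


if __name__ == "__main__":
    print(conceal_gains([1.0, None, None, None, 0.8]))
```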

12 Adaptive Multi-Rate speech codec frame structure

The adaptive multi-rate speech frame structure is described in [9]. The output interface format from the
encoder and input interface format to the decoder is divided into two parts: the core speech data part, which
is the speech coded bits, and an additional data part with mode information.

The interface format described in [9] is termed AMR interface format 1 (AMR IF1). 

Annex A of [9] describes an octet aligned frame format which shall be used in applications requiring octet 
alignment, such as for 3G H.324. This format is termed AMR interface format 2 (AMR IF2). 
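
Octet alignment means that a frame is padded up to a whole number of octets. The sketch below computes the resulting frame length under the assumption that a 4-bit frame-type field precedes the core speech bits; that field width is our reading of Annex A of [9] and should be checked against that specification. The function and parameter names are hypothetical.

```python
def octet_aligned_length(core_bits: int, frame_type_bits: int = 4) -> int:
    """Number of octets needed for the header plus core bits, padded to an octet boundary."""
    if core_bits < 0 or frame_type_bits < 0:
        raise ValueError("bit counts must be non-negative")
    total_bits = frame_type_bits + core_bits
    return (total_bits + 7) // 8  # round up to whole octets


if __name__ == "__main__":
    # 244 core bits per frame at 12.20 kbit/s, 95 core bits at 4.75 kbit/s.
    print(octet_aligned_length(244), octet_aligned_length(95))
```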



13 Adaptive Multi-Rate speech codec interface to RAN

The adaptive multi-rate speech service interface to RAN is described in [10].
[F.F.S.]

14 Adaptive Multi-Rate speech codec performance characterisation

The adaptive multi-rate speech channel performance characterisation is described in [11].
[F.F.S.]